Measuring Effect Sizes: the Effect of Measurement Error
Abstract
Value-added models in education research allow researchers to explore how a wide variety of policies and measured school inputs affect the academic performance of students. Researchers typically quantify the impacts of such interventions in terms of effect sizes, i.e., the estimated effect of a one standard deviation change in the variable divided by the standard deviation of test scores in the relevant population of students. Effect size estimates based on administrative databases typically are quite small. Research has shown that high-quality teachers have large effects on student learning but that measures of teacher qualifications seem to matter little, leading some observers to conclude that, even though effectively choosing teachers can make an important difference in student outcomes, attempting to differentiate teacher candidates based on pre-employment credentials is of little value. This illustrates how the perception that many educational interventions have small effect sizes, as traditionally measured, is having important consequences for policy. In this paper we focus on two issues pertaining to how effect sizes are measured. First, we argue that model coefficients should be compared to the standard deviation of gain scores, not the standard deviation of scores, in calculating most effect sizes. The second issue concerns the need to account for test measurement error. The standard deviation of observed scores in the denominator of the effect-size measure reflects such measurement error as well as the dispersion in the true academic achievement of students, thus overstating variability in achievement. What is of interest is the size of an estimated effect relative to the dispersion in true achievement or in the gain in true achievement. Adjusting effect-size estimates to account for these considerations is straightforward if one knows the extent of test measurement error.
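The adjustment described above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' procedure: it assumes that measurement error has the same variance in both years, is uncorrelated across years, and is uncorrelated with true achievement, so that an observed gain score carries twice the single-test error variance. The function name `effect_sizes` and all input values are hypothetical.

```python
import math


def effect_sizes(coef, sd_score, sd_gain, error_var):
    """Return three versions of an effect size for one model coefficient.

    coef      : estimated effect of a one-SD change in the input (test-score units)
    sd_score  : SD of observed score levels
    sd_gain   : SD of observed gain scores (this year's score minus last year's)
    error_var : measurement-error variance of a single test
                (assumed equal in both years; an assumption of this sketch)
    """
    # An observed gain contains two independent error draws, one per test,
    # so under the stated assumptions its error variance is 2 * error_var.
    true_gain_var = sd_gain ** 2 - 2.0 * error_var
    if true_gain_var <= 0:
        raise ValueError("error variance is too large for the observed gain variance")
    return {
        "vs_observed_scores": coef / sd_score,
        "vs_observed_gains": coef / sd_gain,
        "vs_true_gains": coef / math.sqrt(true_gain_var),
    }


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example = effect_sizes(coef=0.03, sd_score=1.0, sd_gain=0.5, error_var=0.08)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these made-up inputs the same coefficient corresponds to an effect size of about 0.03 against observed scores, 0.06 against observed gains, and 0.10 against gains net of measurement error, which shows how the choice of denominator alone can change the apparent magnitude.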
Technical reports provided by test vendors typically provide information only on the measurement error associated with the test instrument. However, there are a number of other factors, including variation in scores associated with students having particularly good or bad days, which can result in test scores not accurately reflecting true academic achievement. Using the covariance structure of student test scores across grades in New York City from 1999 to 2007, we estimate the overall extent of test measurement error and how measurement error varies across students. Our estimation strategy follows from two key assumptions: (1) there is no persistence (correlation) in each student’s test measurement error across grades; (2) there is at least some persistence in learning across grades, with the degree of persistence constant across grades. Employing the covariance structure of test scores for NYC students and alternative models characterizing the growth in academic achievement, we find estimates of the overall extent of test measurement error to be quite robust. Returning to the analysis of effect sizes, our effect-size estimates based on the dispersion in gain scores net of test measurement error are four times larger than effect sizes typically measured. To illustrate the importance of this difference, we consider results from a recent paper analyzing how various attributes of teachers affect the test-score gains of their students (Boyd et al., in press). Many of the estimated effects appear small when compared to the standard deviation of student achievement, that is, effect sizes of less than 0.05. However, when measurement error is taken into account, the associated effect sizes often are about 0.16.
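To make the role of the two identifying assumptions concrete, here is a simplified sketch on simulated data. It is not the authors' estimator: it assumes true achievement follows a first-order autoregressive process across three consecutive grades with a constant persistence parameter, and that measurement errors are independent across grades. Under those assumptions the off-diagonal covariances of observed scores contain no error variance, and the true-score variance in the middle grade equals Cov(1,2)·Cov(2,3)/Cov(1,3). The function names, the simulated data, and all parameter values are hypothetical.

```python
import numpy as np


def middle_grade_error_variance(scores):
    """Split the middle-grade score variance into true and error components.

    scores: array of shape (n_students, 3), observed scores in three
            consecutive grades for the same students.
    """
    cov = np.cov(scores, rowvar=False)
    # Errors are assumed uncorrelated across grades, so cross-grade
    # covariances reflect true achievement only. With AR(1) true scores,
    # this ratio recovers the true-score variance in the middle grade.
    true_var = cov[0, 1] * cov[1, 2] / cov[0, 2]
    error_var = cov[1, 1] - true_var
    return true_var, error_var


def simulate_scores(n, rho, error_sd, innovation_sd=0.5, seed=0):
    """Simulate observed scores: AR(1) true achievement plus independent error."""
    rng = np.random.default_rng(seed)
    tau1 = rng.normal(0.0, 1.0, n)
    tau2 = rho * tau1 + rng.normal(0.0, innovation_sd, n)
    tau3 = rho * tau2 + rng.normal(0.0, innovation_sd, n)
    true_scores = np.column_stack([tau1, tau2, tau3])
    return true_scores + rng.normal(0.0, error_sd, size=true_scores.shape)


# Simulated data only; the simulated error variance is error_sd ** 2 = 0.16.
scores = simulate_scores(n=200_000, rho=0.9, error_sd=0.4, seed=1)
true_var, error_var = middle_grade_error_variance(scores)
print(f"estimated true-score variance (grade 2): {true_var:.3f}")
print(f"estimated error variance (grade 2):      {error_var:.3f}")
```

The sketch uses only three grades and a single growth model; the abstract indicates that the paper works with more years of data and compares alternative models of achievement growth.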
Furthermore, when teacher attributes are considered jointly, based on the teacher attribute combinations commonly observed, the overall effect of teacher attributes is roughly half a standard deviation of universe score gains – even larger when teaching experience is also allowed to vary. The bottom line is that there are important differences in teacher effectiveness that are systematically related to observed teacher attributes. Such effects are important from a policy perspective, and should be taken into account in the formulation and implementation of personnel policies.